Data used in biostatistics are often collected in online databases, but some data are still collected on
paper. Regardless of their source, the data must be put into electronic format and arranged in a
particular way before statistical software can analyze them. Chapter 8 describes how to get your
data into the computer and arrange them properly so they can be analyzed correctly. It also
describes how to collect and validate your data. Then in Chapter 9, we show you how to summarize
each type of data and display it graphically. We explain how to make bar charts, box-and-whiskers
charts, and more.
Drawing Conclusions from Your Data
Most statistical analysis involves inferring, or drawing conclusions about the population at large
based on your observations of a sample drawn from that population. The theory of statistical
inference is often divided into two broad sub-theories: estimation theory and decision theory.
Statistical estimation theory
Chapter 10 deals with statistical estimation theory, which addresses the question of how accurately
and precisely you can estimate a population parameter from the values you observe in your sample.
For example, you may want to estimate the mean blood hemoglobin concentration in adults with Type
II diabetes, or the true correlation coefficient between body weight and height in certain pediatric
populations. Chapter 10 describes how to estimate these parameters by constructing a confidence
interval around your estimate. A confidence interval is a range that is likely to include the true
population parameter at a stated confidence level (such as 95 percent), and its width gives you an
idea of the precision of your estimate.
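As a rough illustration of the idea, the following Python sketch computes a 95 percent confidence interval around a sample mean. It relies on the normal approximation, which is an assumption made here to keep the example short; with a small sample like this one, a t-based critical value would normally be used instead. The function name and the hemoglobin values are hypothetical and invented purely for illustration.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def mean_confidence_interval(values, confidence=0.95):
    """Return (estimate, lower, upper) for the population mean.

    Uses the normal approximation, which is reasonable for large samples.
    Small samples would normally call for a t-based critical value.
    """
    n = len(values)
    estimate = mean(values)
    # Standard error of the mean: sample standard deviation / sqrt(n)
    standard_error = stdev(values) / sqrt(n)
    # Two-sided critical value (about 1.96 when confidence is 0.95)
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z * standard_error
    return estimate, estimate - margin, estimate + margin


# Hypothetical hemoglobin values in g/dL, invented for illustration only
hemoglobin = [13.2, 14.1, 12.8, 13.9, 14.5, 13.0, 12.6, 14.8, 13.4, 13.7]
estimate, lower, upper = mean_confidence_interval(hemoglobin)
print(f"mean = {estimate:.2f} g/dL, 95% CI = ({lower:.2f}, {upper:.2f})")
```

A narrower interval indicates a more precise estimate. Asking for a higher confidence level (for example, 0.99) widens the interval for the same data.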
Statistical decision theory
Much of the rest of this book deals with statistical decision theory, which is how to decide whether
some effect you’ve observed in your data reflects a real difference or association in the background
population or is merely the result of random fluctuations in your data or sampling. If you measure the
mean blood hemoglobin concentration in two different samples of adults with Type II diabetes, you
will likely get two different numbers. But does this difference reflect a real difference between the groups
in terms of blood hemoglobin concentration? Or is this difference a result of random fluctuations?
Statistical decision theory helps you decide.
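One way to make this question concrete is a permutation test, sketched below in Python. This is not necessarily the approach used in later chapters (which rely on tests such as the t test); it is chosen here only because it needs nothing beyond the standard library. The function names and the two samples are hypothetical. The idea is to shuffle the group labels many times and count how often chance alone produces a difference in means at least as large as the one observed.

```python
import random


def _mean(values):
    return sum(values) / len(values)


def permutation_test(group_a, group_b, n_permutations=10_000, seed=0):
    """Estimate a two-sided p-value for the difference between two means."""
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    observed = abs(_mean(group_a) - _mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        # Randomly reassign the pooled values to two groups of the same sizes
        rng.shuffle(pooled)
        diff = abs(_mean(pooled[:n_a]) - _mean(pooled[n_a:]))
        # Small tolerance guards against floating-point rounding
        if diff >= observed - 1e-12:
            at_least_as_extreme += 1
    return at_least_as_extreme / n_permutations


# Hypothetical hemoglobin values (g/dL) from two samples, invented for illustration
sample_1 = [13.2, 14.1, 12.8, 13.9, 14.5, 13.0, 12.6, 14.8]
sample_2 = [13.6, 13.1, 14.4, 12.9, 13.8, 14.0, 13.3, 14.2]
p_value = permutation_test(sample_1, sample_2)
print(f"estimated p-value: {p_value:.3f}")
```

A large p-value suggests that the observed difference is the kind of thing random fluctuation produces routinely, whereas a small one suggests that a real difference between the groups is the more plausible explanation.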
In Part 4, we cover statistical decision theory in terms of comparing means and proportions between
groups, as well as understanding the relationship between two or more variables.
Comparing groups
In Part 4, we show you different ways to compare groups statistically.
In Chapter 11, you see how to compare average values between two or more groups by using t
tests and ANOVAs. We also describe their nonparametric counterparts that can be used with
skewed or other non-normally distributed data.
Chapter 12 shows how to compare proportions between two or more groups, such as the
proportions of patients responding to two different drugs, using the chi-square and Fisher exact
tests on cross-tabulated (cross-tab) data.
Chapter 13 focuses on one specific kind of cross-tab called the fourfold table, which has exactly